Abstract

The increasing amount of online content motivated the development of multi-document summarization methods. In this work, we explore straightforward approaches to extend single-document summarization methods to multi-document summarization. The proposed methods are based on the hierarchical combination of single-document summaries, and achieve state-of-the-art results.


1 Introduction 


The use of the Internet to fulfill generic information needs motivated pioneering multi-document summarization efforts such as NewsInEssence (Radev et al., 2005) or Newsblaster (McKeown et al., 2002), online since 2001. In general, multi-document summarization approaches have to address two different problems: passage selection and information ordering. Current multi-document systems adopt, for passage selection, approaches similar to the ones used in single-document summarization, and use the chronological order of the documents for information ordering (Christensen et al., 2013). The problem is that most approaches fail to generate summaries that cover generic topics comprising different, equally important, subtopics.

We propose to extend a state-of-the-art single-document summarization method, KP-Centrality (Ribeiro et al., 2013), capable of focusing on diverse important topics while ignoring unimportant ones, to perform multi-document summarization. We explore two hierarchical strategies to perform this extension.


This document is organized as follows: Sect. 2 addresses the related work; Sect. 3 presents our multi-document summarization approach; experimental results close the paper.


2 Related Work 


Most of the current work in automatic summarization focuses on extractive summarization. The most popular baselines for multi-document summarization fall into one of the following general models: Centrality-based (Radev et al., 2004; Erkan and Radev, 2004; Wang et al., 2008; Ribeiro and de Matos, 2011), Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998; Guo and Sanner, 2010; Sanner et al., 2011; Lim et al., 2012), and Coverage-based methods (Lin and Hovy, 2000; Sipos et al., 2012). Additionally, methods such as KP-Centrality (Ribeiro et al., 2013), which is both centrality-based and coverage-based, follow more than one paradigm. In general, Centrality-based models are used to produce generic summaries, while the MMR family generates query-oriented ones. Coverage-based models produce summaries driven by words, topics or events.

Centrality-as-relevance methods base the detection of the most salient passages on the identification of the central passages of the input source(s). One of the main representatives of this family is Passage-to-Centroid Similarity-based Centrality. Centroid-based methods build on the idea of a pseudo-passage that represents the central topic of the input source (the centroid), selecting as passages to be included in the summary the ones that are close to the centroid. Another approach to centrality estimation is to compare each candidate passage to every other passage and select the ones with higher scores (the ones that are closer to every other passage): the Pair-wise Passage Similarity-based Centrality.


MMR (Carbonell and Goldstein, 1998) is a query-driven relevance model based on the following mathematical model:

$$\arg\max_{S_i} \left[ \lambda \, Sim_1(S_i, Q) - (1 - \lambda) \max_{S_j} Sim_2(S_i, S_j) \right]$$

where $Sim_1$ and $Sim_2$ are similarity metrics that do not have to be different; $S_i$ are the yet unselected passages and $S_j$ are the previously selected ones; $Q$ is the query required to apply the model; and $\lambda$ is a parameter that configures the result, ranging from a standard relevance-ranked list ($\lambda = 1$) to a maximal diversity ranking ($\lambda = 0$).
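As an illustration of the MMR criterion above, the following is a minimal greedy sketch, not the implementation used in the experiments. The function names (`mmr_select`, `jaccard`), the word-overlap similarity, and the default values of `lam` and `k` are assumptions made only for this example.

```python
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Toy similarity (an assumption for this sketch): word-set overlap."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def mmr_select(
    candidates: Sequence[str],
    query: str,
    sim1: Callable[[str, str], float] = jaccard,
    sim2: Callable[[str, str], float] = jaccard,
    lam: float = 0.7,
    k: int = 3,
) -> List[str]:
    """Greedily pick up to k passages, each time maximizing the MMR score."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def score(s: str) -> float:
            # Redundancy is the highest similarity to any already selected passage.
            redundancy = max((sim2(s, t) for t in selected), default=0.0)
            return lam * sim1(s, query) - (1.0 - lam) * redundancy

        best = max(remaining, key=score)  # ties go to the earliest candidate
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the selection reduces to ranking by similarity to the query, while `lam=0.0` ignores the query and only penalizes redundancy with what has already been selected.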

Coverage-based summarization defines a set of concepts that need to occur in the sentences selected for the summaries. The concepts are events (Filatova and Hatzivassiloglou, 2004), topics (Lin and Hovy, 2000), salient words (Lin and Bilmes, 2010; Sipos et al., 2012), and word n-grams (Gillick et al., 2008; Almeida and Martins, 2013).


3 Multi-Document Summarization 


Our multi-document approach is built upon a centrality and coverage-based single-document summarization method, KP-Centrality (Ribeiro et al., 2013). This method, through the use of key phrases, is easily adaptable and has been shown to be robust in the presence of noisy input. This is an important aspect, considering that using several documents as input frequently increases the amount of unimportant content.

When adapting a single-document summarization method to perform multi-document summarization, a possible strategy is to combine the summaries of each document. To iteratively combine the summaries, we explore two different approaches: single-layer hierarchical and waterfall. Given that the summarization method also uses as input a set of key phrases, we extract from each input document the required set of key phrases, join the extracted sets, and rank the key phrases using their frequency. To generate each summary, we use the top key phrases, excluding the ones that do not occur in the input document.
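The key phrase handling described above can be sketched as follows. This is only an illustration under stated assumptions: "frequency" is read here as the number of documents whose extracted set contains the phrase, phrases are filtered by occurrence before the top ones are taken, and matching is a case-insensitive substring test. The function names and the `top_n` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def rank_key_phrases(per_document: Dict[str, List[str]]) -> List[str]:
    """Join the per-document key phrase sets and rank phrases by frequency."""
    counts: Counter = Counter()
    for phrases in per_document.values():
        # Each document contributes a phrase at most once (assumption).
        counts.update({p.lower() for p in phrases})
    # Most frequent first; alphabetical order breaks ties deterministically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ordered]


def key_phrases_for(text: str, ranked: List[str], top_n: int = 40) -> List[str]:
    """Keep the top-ranked phrases that actually occur in the given text."""
    lowered = text.lower()
    return [p for p in ranked if p in lowered][:top_n]
```

For each document (or intermediate summary) to be summarized, `key_phrases_for` would supply the key phrases passed to the single-document summarizer.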


3.1 Single-Document Summarization Method 

To retrieve the most important sentences of an information source, we used the KP-Centrality method (Ribeiro et al., 2013). We chose this model for its adaptability to different types of information sources (e.g., text, audio and video), while supporting privacy (Marujo et al., 2014) and offering state-of-the-art performance. It is based on the notion of combining key phrases with support sets. A support set is a group of the most semantically related passages. These semantically related passages are chosen using heuristics based on the passage order method (Ribeiro and de Matos, 2011). This type of heuristic uses the structure of the input document (source) to partition the candidate passages to be included in the support set into two subsets: the ones closer to the passage associated with the support set under construction and the ones further apart. These heuristics use a permutation, $d_1^i, d_2^i, \dots, d_{N-1}^i$, of the distances of the passages $s_k$ to the passage $p_i$, related to the support set under construction, with $d_k^i = dist(s_k, p_i)$, $1 \le k \le N - 1$, where $N$ is the number of passages, corresponding to the order of occurrence of passages $s_k$ in the input source. The metric that is normally used is the cosine distance.

The KP-Centrality method consists of two steps. First, it extracts key phrases using a supervised approach (Marujo et al., 2012) and combines them with a bag-of-words model in a compact matrix representation, given by:

$$\begin{bmatrix} w(t_1, p_1) & \cdots & w(t_1, p_N) & w(t_1, k_1) & \cdots & w(t_1, k_M) \\ \vdots & & \vdots & \vdots & & \vdots \\ w(t_T, p_1) & \cdots & w(t_T, p_N) & w(t_T, k_1) & \cdots & w(t_T, k_M) \end{bmatrix} \qquad (1)$$

where $w$ is a function of the number of occurrences of term $t_i$ in passage $p_j$ or key phrase $k_l$, $T$ is the number of terms and $M$ is the number of key phrases. Then, using a segmented information source $I = p_1, p_2, \dots, p_N$, a support set $S_i$ is computed for each passage $p_i$ using:

$$S_i = \{ s \in I \cup K : sim(s, q_i) > \varepsilon_i \wedge s \neq q_i \}, \qquad (2)$$

for $i = 1, \dots, N + M$. Passages are ranked excluding the key phrases $K$ (artificial passages) according to:

$$\arg\max_{s \in (\cup_{i=1}^{N+M} S_i) - K} \left| \{ S_i : s \in S_i \} \right|. \qquad (3)$$
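The support-set ranking of Equations (2) and (3) can be illustrated with the simplified sketch below. It is not the KP-Centrality implementation: it assumes raw term counts with whitespace tokenization instead of the weighting function $w$, cosine similarity for $sim$, and a single global threshold `eps` in place of the per-set thresholds derived from the passage order heuristics. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def bow(text: str) -> Counter:
    """Bag-of-words vector with raw term counts (simplifying assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kp_centrality_rank(passages: List[str], key_phrases: List[str], eps: float = 0.3) -> List[int]:
    """Order passage indices by the number of support sets that contain them."""
    items = [bow(x) for x in passages + key_phrases]  # elements of I followed by K
    n = len(passages)
    votes = [0] * n
    for i, q in enumerate(items):  # one support set per passage and per key phrase
        for j in range(n):  # only real passages are ranked; key phrases are excluded
            if j != i and cosine(items[j], q) > eps:
                votes[j] += 1  # passage j belongs to support set S_i
    return sorted(range(n), key=lambda j: (-votes[j], j))
```

Because key phrases act as artificial passages with their own support sets, a passage that is similar to a key phrase receives an extra vote, which is how the key phrases steer the ranking in this sketch.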






















































3.2 Single-Layer Hierarchical 

In this model, we use KP-Centrality to generate, for each news document, an intermediate summary with the same size as the output summary for the input documents. An aggregated summary is obtained by concatenating the chronologically ordered intermediate summaries. The output summary is then generated by applying KP-Centrality again to the aggregated summary, as Figure 1 shows.

Figure 1: Single-layer architecture.
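The single-layer strategy can be sketched as below, under the assumption that the single-document summarizer is available as a function taking a text and a target size. The stand-in `lead_words` (which simply keeps the first words) is not KP-Centrality; it is used only to make the example runnable, and all names are hypothetical.

```python
from datetime import date
from typing import Callable, List, Tuple

Summarizer = Callable[[str, int], str]


def lead_words(text: str, size: int) -> str:
    """Stand-in summarizer (NOT KP-Centrality): keep the first `size` words."""
    return " ".join(text.split()[:size])


def single_layer(docs: List[Tuple[date, str]], summarize: Summarizer, size: int) -> str:
    """Summarize each document, concatenate in date order, summarize again."""
    ordered = [text for _, text in sorted(docs, key=lambda d: d[0])]
    # Every intermediate summary has the same size as the final output.
    intermediate = [summarize(text, size) for text in ordered]
    aggregated = " ".join(intermediate)
    return summarize(aggregated, size)
```

For example, `single_layer(docs, lead_words, 250)` would produce a 250-word output from a list of `(date, text)` pairs; replacing `lead_words` with a real summarizer leaves the combination logic unchanged.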

3.3 Waterfall 

This model differs from the previous one in the merging process. The underlying merging of the documents follows a cascaded process: it starts by merging the intermediate summaries, with the same size as the output summary, of the first two documents, according to their chronological order. This merged document is then summarized and merged with the summary of the following document. We iterate this process through all the documents until the most recent one, as Figure 2 illustrates.



Figure 2: Waterfall architecture. 
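The cascaded merging of the waterfall strategy can be sketched as follows. As before, the summarizer is passed in as a function, and `lead_words` is only a runnable stand-in, not KP-Centrality; the function names and the handling of empty or single-document inputs are assumptions of this example.

```python
from datetime import date
from typing import Callable, List, Tuple

Summarizer = Callable[[str, int], str]


def lead_words(text: str, size: int) -> str:
    """Stand-in summarizer (NOT KP-Centrality): keep the first `size` words."""
    return " ".join(text.split()[:size])


def waterfall(docs: List[Tuple[date, str]], summarize: Summarizer, size: int) -> str:
    """Fold the documents in date order, re-summarizing after every merge."""
    ordered = [text for _, text in sorted(docs, key=lambda d: d[0])]
    if not ordered:
        return ""
    current = summarize(ordered[0], size)
    for text in ordered[1:]:
        # Merge the running summary with the summary of the next document,
        # then summarize the merged text back down to the output size.
        merged = current + " " + summarize(text, size)
        current = summarize(merged, size)
    return current
```

Compared with the single-layer sketch, the summarizer here never sees more than two summaries at a time, so content from older documents must survive several rounds of summarization to reach the output.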


4 Experimental Results 


We compare the performance of our methods against other representative models, namely MEAD, MMR, Expected n-call@k (Lim et al., 2012), and the Portfolio Theory (Wang and Zhu, 2009). MEAD is a centroid-based method and one of the most popular centrality-based methods. MMR is one of the most used query-based methods. Expected n-call@k adapts and extends MMR as a probabilistic model (Probabilistic Latent MMR). The Portfolio Theory also extends MMR based on the idea of ranking under uncertainty. As baseline, we used the straightforward idea of combining all input documents into a single one, and then submitting that document to the single-document summarization method. Considering that most coverage-based systems explore event information, we opted for not including them in this comparative analysis.

To assess the informativeness of the summaries generated by our methods, we used ROUGE-1 and ROUGE-2 (Lin, 2004) on the DUC 2007 and TAC 2009 datasets. The main summarization task in DUC 2007¹ is the generation of 250-word summaries of 45 clusters of 25 newswire documents (from the AQUAINT corpus) and 4 human reference summaries. The TAC 2009 Summarization task² has 44 topic clusters. Each topic has 2 sets of 10 news documents obtained from the AQUAINT 2 corpus. There are 4 human 100-word reference summaries for each set, where the reference summaries for the first set are query-oriented, and for the second set are update summaries. In this work, we used the first set of reference summaries. We evaluate the different models by generating summaries with 250 words. We only present the best results.
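For readers unfamiliar with the metric, the following is a simplified sketch of ROUGE-N recall (clipped n-gram matches over the total number of reference n-grams). It is an illustration only: it assumes lowercased whitespace tokenization and omits the stemming, stop-word, and jackknifing options of the official toolkit, so its values are not expected to match the scores reported here. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: List[str], n: int = 1) -> float:
    """Clipped n-gram matches divided by the total reference n-gram count."""
    cand = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # A reference n-gram is matched at most as often as the candidate has it.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

Calling the function with `n=1` and `n=2` gives the unigram and bigram variants that correspond to ROUGE-1 and ROUGE-2.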

The features used include the bag-of-words model representation of the sentences (TF-IDF), the key phrases and the query (obtained from the topic descriptions). Including the query is a new extension to the KP-Centrality method, which, in general, improved the results. We experimented with different numbers of key phrases, obtaining the best results with 40 key phrases. To compare and rank the sentences, we use several distance metrics, namely: Frac133 (generic Minkowski distance


1 http://www-nlpir.nist.gov/projects/duc/duc2007/tasks.html 

2 http://www.nist.gov/tac/2009/Summarization/ 

















                                     DUC 2007          TAC 2009
Distance  Model                      R1      R2        R1      R2
frac133   baseline                   0.3565  0.0744    0.4706  0.1268
cosine    baseline                   0.3406  0.0670    0.4746  0.1391
frac133   waterfall                  0.3569  0.0765    0.4943  0.1441
frac133   single-layer               0.3775  0.0882    0.4983  0.1526
cosine    waterfall                  0.3701  0.0904    0.5137  0.1693
cosine    single-layer               0.3707  0.0822    0.4993  0.1590
frac133   single-layer (shuffle)     0.3689  0.0807    0.5060  0.1483
cosine    waterfall (shuffle)        0.3626  0.0844    0.5107  0.1630
-         MEAD                       0.3282  0.0765    0.4153  0.0845
-         MMR                        0.3269  0.0780    0.3917  0.0801
-         E.n-call@k                 0.3209  0.0701    0.3873  0.0699
-         Portfolio                  0.3595  0.0792    0.4292  0.0758
-         LexRank                    0.2881  0.0534    0.3845  0.0623

Table 1: ROUGE-1 (R1) and ROUGE-2 (R2) scores.


with N = 1.(3)), Euclidean, Chebyshev, Manhattan, Minkowski, the Jensen-Shannon Divergence, and the cosine similarity. Table 1 shows that the best results were obtained by the proposed hierarchical models, in both datasets. Overall, the best performing distance metric for our centrality-based method was the cosine similarity and the best strategy for combining the information was the waterfall approach, namely in terms of ROUGE-2. In DUC 2007, frac133 using the single-layer method achieved the best ROUGE-1 score, although the difference for cosine is hardly noticeable. Single-layer with frac133 shows a performance improvement of 0.0180 ROUGE-1 points (relative performance improvement of 5.0%) over the best of the other systems, Portfolio, in DUC 2007; in TAC 2009, the improvement over Portfolio reaches 0.0845 ROUGE-1 points (19.7% relative performance improvement), a figure that corresponds to the waterfall method using cosine in Table 1. In terms of ROUGE-2, the waterfall method using cosine achieved an improvement of 0.0112 (relative performance improvement of 14.1%) over Portfolio, in DUC 2007, and of 0.0848 (relative performance improvement of 100.4%) over MEAD, the best performing of the reference systems using this metric, in TAC 2009. Note that our baseline obtained results similar to the best reference system in DUC 2007 and better results than all reference systems in TAC 2009 (0.0454 ROUGE-1 points corresponding to a 10.6% relative performance improvement; 0.0546 ROUGE-2 points corresponding to a 64.6% relative performance improvement). The better results obtained on the TAC 2009 dataset are due to the small size of the reference summaries and to the fact that the document sets to be summarized contain topics with a higher diversity of subtopics.
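The Frac133 metric named among the distance metrics is described in the text as a generic Minkowski distance with N = 1.(3). A minimal sketch of that family of distances is given below, reading 1.(3) as the repeating decimal 4/3; the function names are hypothetical and the vectors are assumed to be dense lists of equal length.

```python
from typing import Sequence


def minkowski(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Minkowski distance of order p between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def frac133(x: Sequence[float], y: Sequence[float]) -> float:
    """Minkowski distance with order 1.(3), taken here as 4/3."""
    return minkowski(x, y, 4.0 / 3.0)


def chebyshev(x: Sequence[float], y: Sequence[float]) -> float:
    """Limit of the Minkowski distance as p grows: the largest coordinate gap."""
    return max(abs(a - b) for a, b in zip(x, y))
```

Setting `p=1` gives the Manhattan distance and `p=2` the Euclidean distance, so the fractional order 4/3 yields values between those two for any given pair of vectors.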

The shuffle results included in Table 1 are averages of 10 trials. They are lower than the others obtained using the documents organized in chronological order. This suggests that the order of the input documents is important to the summarization methods.

Figure 3 shows an example of a summary produced by our multi-document method. The figure also includes the respective reference summary for comparison.

5 Conclusions and Future Work 

In this work, we explore two different approaches to extend a single-document summarization method to multi-document summarization: single-layer hierarchical and waterfall.

Experimental results show that the proposed approaches perform better than previous state-of-the-art methods on standard datasets used to evaluate this task. In general, the best performing approach is the waterfall approach using the cosine similarity. In fact, this configuration achieves the best results on the TAC 2009 dataset, considering both ROUGE-1








Generated Summary: 


President Bill Clinton said Friday he will appeal a federal judge’s ruling that struck down a law giving the president the power to veto specific items in bills passed by Congress. The law, passed by Congress last year, allowed the president for the first time to veto particular items in spending bills and certain limited tax provisions passed by Congress. Clinton said the funding that Congress has added to the bill is excessive and threatened to veto some items by using the line-item veto power. The White House said that the president used his authority to cancel projects that were not requested in the budget and would not substantially improve the quality of life of military service members. Judge Thomas Hogan ruled that the law - which gives the president the power to strike items from tax and spending measures without vetoing the entire bill - violates the traditional balance of powers between the various branches of government “The Line-Item Veto Act is unconstitutional because it impermissibly disrupts the balance of powers among the three branches of government,” said Thomas Hogan.” In its appeal, the Justice Department argues that the new challengers also do not have standing to challenge the law, and that in any case the law is in line with the historic relationship between Congress and the president.


Reference summary: 


Congress passed a law authorizing the line item veto (LIV) in 1996 accepting arguments that the measure would help preserve the integrity of federal spending by allowing the president to strike unnecessary spending and tax items from legislation thus encouraging the government to live within its means. It was considered in line with the historic relationship between Congress and the president and would provide a tool for eliminating wasteful pork barrel spending while enlivening debate over the best use of funds. It was argued that the LIV would represent presidential exercise of spending authority delegated by Congress. President Clinton exercised the LIV on 82 items in 1997 saving $1.9 billion in spending projected over five years. The affected items were projects for specific localities, many in the area of military construction, which had been added to the president’s budget by Congress. The first court ruling on the LIV act was in U.S. District Court when in February 1998 it was ruled unconstitutional on the grounds that it violated the separation of powers. The Department of Justice appealed that decision and in June 1998 the Supreme Court ruled the LIV act unconstitutional but on the grounds that it violated Article I, 7, Clause 2 (The “presentment clause”) of the Constitution that establishes the process by which a bill becomes law. President Clinton expressed his deep disappointment.


Figure 3: Example of a summary produced by our summarizer and the corresponding reference summary, for Topic D0730G of DUC 2007.


and ROUGE-2 metrics, and, although it does not achieve the best results on the DUC 2007 dataset in terms of ROUGE-1, it also achieves a performance improvement over Portfolio of 0.0106 ROUGE-1 points (relative performance improvement of 3%).

In future work, we aim to adapt the proposed 
multi-document summarization method to perform 
abstractive summarization. 